Fujitsu Laboratories TREC8 Report

Authors

Isao Namba

Nobuyuki Igata

Abstract

This year a Fujitsu Laboratory team participated in three tracks: the ad hoc track, the small web track, and the large web track. As basic techniques, we compared four popular stemmers and built a simple technique for removing stop patterns from TREC queries. For the ad hoc task and the small web track, we used the same techniques. We experimented with area weighting, co-occurrence boosting, bi-gram utilization, and reranking by bi-gram extraction from a pilot search. The effect of applying those techniques blindly is rather limited, or even uncertain, in the TREC8 experiment. What we can say from the TREC8 results is that blind application of co-occurrence boosting and area weighting may be effective for the small web track. They require query-dependent application. In the large web track, our main interest is efficiency, that is, how many resources are required to process 100GB of web text and 10000 real web queries in practical time. Using a statistical language type checker, we can eliminate 23% of non-English text. This speeds up indexing and reduces the index size. The search speed for an inverted file is CPU intensive if the target machine has main memory in excess of 10-25% of the index size. So with simple but effective index compression methods, the throughput of query processing is about 0.54-1.1 queries/second even on a single 300MHz UltraSPARC processor.
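The abstract does not say which index compression methods were used. Purely as an illustration of what a "simple" method can look like, the sketch below stores each posting list as gaps between sorted document IDs and writes every gap in variable-byte form. The function names `vbyte_encode` and `vbyte_decode` are hypothetical and are not taken from the report; variable-byte gap coding is a standard textbook technique chosen here as an assumption.

```python
from typing import Iterable, List


def vbyte_encode(doc_ids: Iterable[int]) -> bytes:
    """Encode strictly increasing, positive document IDs as variable-byte gaps.

    Illustrative only: the report's actual compression scheme is not specified.
    """
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        if gap <= 0:
            raise ValueError("doc_ids must be positive and strictly increasing")
        prev = doc_id
        # Emit 7 bits per byte, low-order group first.
        # A set high bit means "more bytes follow for this gap".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Invert vbyte_encode and return the original document IDs."""
    doc_ids: List[int] = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # continuation byte: keep accumulating
        else:
            prev += gap         # last byte of this gap: rebuild the ID
            doc_ids.append(prev)
            gap = 0
            shift = 0
    return doc_ids


if __name__ == "__main__":
    postings = [3, 7, 200, 100000]
    encoded = vbyte_encode(postings)
    print(len(encoded), "bytes for", len(postings), "postings")
    print(vbyte_decode(encoded) == postings)
```

Small gaps fit in a single byte, so dense posting lists shrink considerably compared with fixed-width integers, while decoding needs only shifts and masks. That trade of a little CPU work for a smaller index is consistent with the abstract's remark that search becomes CPU intensive once enough of the index fits in memory.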

Similar resources

Fujitsu Laboratories TREC9 Report

This year a Fujitsu Laboratory team participated in the web tracks. For TREC9 we experimented with passage retrieval, which is expected to be effective for Web pages that contain more than one topic. To split documents into passages, we used an NLP-based paragraph detection program rather than a fixed (or variable) window size. But it did not produce better results for the TREC9 Web data. For indexing large web data faster, ...

Fujitsu Laboratories TREC2001 Report

This year a Fujitsu Laboratory team participated in the web tracks. For both the ad hoc task and the entry point search task, we combined the score of normal ranking search with that of page ranking techniques. For the ad hoc style task, the effect of page ranking was very limited. We got only a very small improvement for title field search, and the page rank was not effective for description, and narrative field sear...

Fujitsu Laboratories TREC8 Report - Ad hoc, Small Web, and Large Web Track

Fujitsu Laboratories TREC7 Report

In our first participation in TREC, our focus was on improving the basic ranking systems and applying text clustering techniques for query expansion. We tested a variety of techniques, including reference measures, passage retrieval, and data fusion, for the basic ranking systems. Some techniques were used in the official run; others were not used because of time limitations. We applied the text cl...

Controlling Polarization in Quantum-dot Semiconductor Optical Amplifiers

1 Fujitsu Limited and Optoelectronic Industry and Technology Development Association 2 Institute for Nano Quantum Information Electronics (INQIE), The University of Tokyo 3 Fujitsu Limited and Optoelectronic Industry and Technology Development Association 4 Fujitsu Laboratories Limited 5 Department of Electrical and Electronics Engineering, Facility of Engineering, Kobe University 6 Department ...

Publication date: 2007
